最近,基于CNN的RGB-D显着对象检测(SOD)在检测准确性方面取得了显着提高。但是,现有模型通常在效率和准确性方面表现良好。这阻碍了他们在移动设备以及许多实际问题上的潜在应用。在本文中,为了弥合RGB-D SOD的轻质和大型模型之间的准确性差距,一个有效的模块可以极大地提高准确性,但提出了很少的计算。受深度质量是影响准确性的关键因素的启发,我们提出了有效的深度质量启发的功能操纵(DQFM)过程,该过程可以根据深度质量动态滤波深度特征。提出的DQFM求助于低级RGB和深度特征的对齐,以及深度流的整体注意力,以明确控制和增强交叉模式融合。我们嵌入了DQFM,以获得一个称为DFM-NET的有效的轻质RGB-D SOD模型,此外,我们还设计了一个定制的深度骨架和两阶段解码器作为基本零件。 9个RGB-D数据集的广泛实验结果表明,我们的DFM-NET优于最近的有效型号,在CPU上以约20 fps的速度运行,仅8.5mb型号大小,同时快2.9/2.4倍,比6.7/3.1倍,小于6.7/3.1倍最新的最佳型号A2DELE和手机。与非效率模型相比,它还保持最先进的准确性。有趣的是,进一步的统计数据和分析验证了DQFM在没有任何质量标签的各种品质的深度图中的能力。最后但并非最不重要的一点是,我们进一步应用DFM-NET来处理视频SOD(VSOD),与最近的有效模型相比,相当的性能,而比该领域的先前最佳状态的速度/2.3倍/小2.3倍。我们的代码可在https://github.com/zwbx/dfm-net上找到。
translated by 谷歌翻译
深度可以为显着对象检测(SOD)提供有用的地理线索,并已证明对最近的RGB-D SOD方法有所帮助。但是,现有的视频显着对象检测(VSOD)方法仅利用时空信息,很少利用深度信息进行检测。在本文中,我们提出了一个深度合并的三峰网络,称为VSOD的DCTNet,这是一项开创性的工作,旨在合并深度信息以帮助VSOD。为此,我们首先从RGB框架中生成深度,然后提出一种方法来不平等地处理这三种方式。具体而言,多模式注意模块(MAM)设计为对主模态(RGB)和两个辅助模态(深度,光流)之间的多模式远程依赖性建模。我们还引入了一个细化融合模块(RFM),以抑制每种模式中的噪音,并动态选择有用的信息以进行进一步的优化。最后,在精制特征以实现最终的跨模式融合后采用了渐进式融合策略。五个基准数据集的实验证明了我们的深度合并模型与12种最先进方法的优越性,并且还验证了深度的必要性。
translated by 谷歌翻译
伪装的对象检测(COD)旨在检测周围环境的类似模式(例如,纹理,强度,颜色等)的对象,最近吸引了日益增长的研究兴趣。由于伪装对象通常存在非常模糊的边界,如何确定对象位置以及它们的弱边界是具有挑战性的,也是此任务的关键。受到生物视觉感知过程的启发,当人类观察者发现伪装对象时,本文提出了一种名为Errnet的新型边缘的可逆重新校准网络。我们的模型的特点是两种创新设计,即选择性边缘聚集(SEA)和可逆的重新校准单元(RRU),其旨在模拟视觉感知行为,并在潜在的伪装区域和背景之间实现有效的边缘和交叉比较。更重要的是,RRU与现有COD模型相比,具有更全面的信息。实验结果表明,errnet优于三个COD数据集和五个医学图像分割数据集的现有尖端基线。特别是,与现有的Top-1模型SINET相比,ERRNET显着提高了$ \ SIM 6%(平均电子测量)的性能,以显着高速(79.3 FPS),显示ERRNET可能是一般和强大的解决方案COD任务。
translated by 谷歌翻译
Despite significant progress in object categorization, in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited sized class vocabularies and typically requires separation between supervised and unsupervised classes, allowing former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above mentioned challenges and address problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to 310K class vocabulary on Animal with Attributes and ImageNet datasets.
translated by 谷歌翻译
A noisy training set usually leads to the degradation of the generalization and robustness of neural networks. In this paper, we propose a novel theoretically guaranteed clean sample selection framework for learning with noisy labels. Specifically, we first present a Scalable Penalized Regression (SPR) method, to model the linear relation between network features and one-hot labels. In SPR, the clean data are identified by the zero mean-shift parameters solved in the regression model. We theoretically show that SPR can recover clean data under some conditions. Under general scenarios, the conditions may be no longer satisfied; and some noisy data are falsely selected as clean data. To solve this problem, we propose a data-adaptive method for Scalable Penalized Regression with Knockoff filters (Knockoffs-SPR), which is provable to control the False-Selection-Rate (FSR) in the selected clean data. To improve the efficiency, we further present a split algorithm that divides the whole training set into small pieces that can be solved in parallel to make the framework scalable to large datasets. While Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline, we further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data. Experimental results on several benchmark datasets and real-world noisy datasets show the effectiveness of our framework and validate the theoretical results of Knockoffs-SPR. Our code and pre-trained models will be released.
translated by 谷歌翻译
As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpus, especially for languages with insufficient resources such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation CORGI-PM, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which requires the models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To our best knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
translated by 谷歌翻译
Medical image segmentation (MIS) is essential for supporting disease diagnosis and treatment effect assessment. Despite considerable advances in artificial intelligence (AI) for MIS, clinicians remain skeptical of its utility, maintaining low confidence in such black box systems, with this problem being exacerbated by low generalization for out-of-distribution (OOD) data. To move towards effective clinical utilization, we propose a foundation model named EvidenceCap, which makes the box transparent in a quantifiable way by uncertainty estimation. EvidenceCap not only makes AI visible in regions of uncertainty and OOD data, but also enhances the reliability, robustness, and computational efficiency of MIS. Uncertainty is modeled explicitly through subjective logic theory to gather strong evidence from features. We show the effectiveness of EvidenceCap in three segmentation datasets and apply it to the clinic. Our work sheds light on clinical safe applications and explainable AI, and can contribute towards trustworthiness in the medical domain.
translated by 谷歌翻译
Most Graph Neural Networks follow the message-passing paradigm, assuming the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are always incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explored. We proposed PRI-GSL, a Graph Structure Learning framework guided by the Principle of Relevant Information, providing a simple and unified framework for identifying the self-organization and revealing the hidden structure. PRI-GSL learns a structure that contains the most relevant yet least redundant information quantified by von Neumann entropy and Quantum Jensen-Shannon divergence. PRI-GSL incorporates the evolution of quantum continuous walk with graph wavelets to encode node structural roles, showing in which way the nodes interplay and self-organize with the graph structure. Extensive experiments demonstrate the superior effectiveness and robustness of PRI-GSL.
translated by 谷歌翻译
State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. H3 matches attention on the synthetic languages and comes within 0.4 PPL of Transformers on OpenWebText. Furthermore, a hybrid 125M-parameter H3-attention model that retains two attention layers surprisingly outperforms Transformers on OpenWebText by 1.0 PPL. Next, to improve the efficiency of training SSMs on modern hardware, we propose FlashConv. FlashConv uses a fused block FFT algorithm to improve efficiency on sequences up to 8K, and introduces a novel state passing algorithm that exploits the recurrent properties of SSMs to scale to longer sequences. FlashConv yields 2$\times$ speedup on the long-range arena benchmark and allows hybrid language models to generate text 1.6$\times$ faster than Transformers. Using FlashConv, we scale hybrid H3-attention language models up to 1.3B parameters on the Pile and find promising initial results, achieving lower perplexity than Transformers and outperforming Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark.
translated by 谷歌翻译
In this paper, we study the \underline{R}obust \underline{o}ptimization for \underline{se}quence \underline{Net}worked \underline{s}ubmodular maximization (RoseNets) problem. We interweave the robust optimization with the sequence networked submodular maximization. The elements are connected by a directed acyclic graph and the objective function is not submodular on the elements but on the edges in the graph. Under such networked submodular scenario, the impact of removing an element from a sequence depends both on its position in the sequence and in the network. This makes the existing robust algorithms inapplicable. In this paper, we take the first step to study the RoseNets problem. We design a robust greedy algorithm, which is robust against the removal of an arbitrary subset of the selected elements. The approximation ratio of the algorithm depends both on the number of the removed elements and the network topology. We further conduct experiments on real applications of recommendation and link prediction. The experimental results demonstrate the effectiveness of the proposed algorithm.
translated by 谷歌翻译